All in R!
tidyverse, sf, terra, leaflet, tmap, dplyr
Spatial data are any type of data that directly or indirectly references a specific geographical area or location.
Spatial data combine geospatial coordinates with attributes of those coordinates.
© Fernanda Ochoa
Raster data: a matrix of cells (pixels), each of which contains a value.
Useful for continuous phenomena:
Each cell can contain one attribute (e.g., elevation) or several (e.g., RGB). These attribute layers are called “bands”.
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 100 100 101 101 101 101 101 100 100 100
[2,] 101 101 102 102 102 102 102 101 101 101
[3,] 102 102 103 103 103 103 103 102 102 102
[4,] 103 103 104 104 104 104 104 103 103 103
[5,] 104 104 105 105 105 105 105 104 104 103
[6,] 105 105 105 106 106 106 106 105 105 104
[7,] 105 106 106 107 107 107 107 106 106 105
[8,] 106 107 107 108 108 108 108 107 107 106
[9,] 107 108 108 109 109 109 109 108 108 107
[10,] 108 109 109 110 110 110 110 109 109 108
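A minimal sketch of how a matrix like the one above can become a raster object, assuming the terra package is installed. The values and object names (`elev`, `rgb_array`) are invented for illustration and are not taken from the workshop material.

```r
library(terra)

# Hypothetical 3 x 4 elevation grid (values invented for illustration)
elev <- matrix(c(100, 101, 102, 103,
                 101, 102, 103, 104,
                 102, 103, 104, 105),
               nrow = 3, byrow = TRUE)

# One-band raster: each matrix entry becomes one cell
r <- rast(elev)
nlyr(r)   # number of bands (layers)
ncell(r)  # number of cells

# Three-band raster from a 3-D array (rows x columns x bands),
# mimicking an RGB image with made-up values
rgb_array <- array(seq_len(3 * 4 * 3), dim = c(3, 4, 3))
rgb <- rast(rgb_array)
nlyr(rgb)
```

Without an extent or CRS these rasters are just grids; real data would normally carry both.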
The vector data model represents the world using points, lines, and polygons. These are well-defined geometries in a coordinate reference system (CRS).
Useful for discrete phenomena:
Each element (geometry) can be associated with a range of attributes in a data frame.
There are three main shapes, called geometries or features, in the sf framework: points, linestrings, and polygons.
They all have “multi-” counterparts: multipoints, multilinestrings, and multipolygons.
A point is a coordinate in \(n\) dimensions (usually 2).
A linestring is a sequence of points with straight lines connecting consecutive points.
A polygon is a sequence of points that form a closed, non-intersecting ring. The first and the last point of a polygon have the same coordinates.
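The three definitions above can be sketched with sf's geometry constructors. The coordinates are arbitrary illustrative values, and no CRS is attached, so lengths and areas are in plain coordinate units.

```r
library(sf)

# A point: one coordinate pair
pt <- st_point(c(1, 2))

# A linestring: a sequence of points joined by straight segments
ln <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 1)))

# A polygon: a closed ring (first and last points are identical)
ring <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))
pg <- st_polygon(list(ring))

st_geometry_type(pt)  # geometry type of the point
st_length(ln)         # length of the two segments: sqrt(2) + 1
st_area(pg)           # area of the unit square: 1
```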
library(sf)

# Outer ring (p1) and inner ring (p2, a hole); each ring closes on its first point
p1 <- rbind(c(0, 0), c(1, 0), c(3, 2), c(2, 4), c(1, 4), c(0, 0))
p2 <- rbind(c(1, 1), c(1, 2), c(2, 2), c(1, 1))
polygon <- st_polygon(list(p1, p2))
# Two polygons: the first with a hole, the second a scaled and shifted copy of p2
multipolygon <- st_multipolygon(list(list(p1, p2), list(p2 * 2 + 2)))
plot(polygon, axes = TRUE, lwd = 2, main = "POLYGON", col = "grey")
plot(multipolygon, axes = TRUE, lwd = 2, main = "MULTIPOLYGON", col = "grey")
An sf object combines classical data.frame elements (columns, rows, column names…) with geographic properties stored in a geometry column (an sfc object).
Simple feature collection with 6 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 1568217 ymin: 5518431 xmax: 2089533 ymax: 6191874
Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
Name Island Land_area Population Median_income Sex_ratio
1 Northland North 12500.561 175500 23400 0.9424532
2 Auckland North 4941.573 1657200 29600 0.9442858
3 Waikato North 23900.036 460100 27900 0.9520500
4 Bay of Plenty North 12071.145 299900 26200 0.9280391
5 Gisborne North 8385.827 48500 24400 0.9349734
6 Hawke's Bay North 14137.524 164000 26100 0.9238375
geom
1 MULTIPOLYGON (((1745493 600...
2 MULTIPOLYGON (((1803822 590...
3 MULTIPOLYGON (((1860345 585...
4 MULTIPOLYGON (((2049387 583...
5 MULTIPOLYGON (((2024489 567...
6 MULTIPOLYGON (((2024489 567...
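To see how the data.frame part and the geometry part fit together, here is a small sketch that builds an sf object by hand. The table `cities`, its values, and its column names are hypothetical; only `st_as_sf()` and `st_geometry()` are actual sf functions.

```r
library(sf)

# Hypothetical attribute table with longitude/latitude columns
cities <- data.frame(
  name = c("A", "B", "C"),
  pop  = c(100, 250, 75),
  lon  = c(174.76, 174.78, 172.64),
  lat  = c(-36.85, -41.29, -43.53)
)

# Turn the coordinate columns into a geometry column (WGS 84, EPSG:4326)
cities_sf <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)

class(cities_sf)               # an sf object is also a data.frame
class(st_geometry(cities_sf))  # the geometry column is an sfc object
```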
One can get the data.frame part like this:
Name Island Land_area Population Median_income Sex_ratio
1 Northland North 12500.5611 175500 23400 0.9424532
2 Auckland North 4941.5726 1657200 29600 0.9442858
3 Waikato North 23900.0364 460100 27900 0.9520500
4 Bay of Plenty North 12071.1447 299900 26200 0.9280391
5 Gisborne North 8385.8266 48500 24400 0.9349734
6 Hawke's Bay North 14137.5244 164000 26100 0.9238375
7 Taranaki North 7254.4804 118000 29100 0.9569363
8 Manawatu-Wanganui North 22220.6084 234500 25000 0.9387734
9 Wellington North 8048.5528 513900 32700 0.9335524
10 West Coast South 23245.4559 32400 26900 1.0139072
11 Canterbury South 44504.4991 612000 30100 0.9753265
12 Otago South 31186.3092 224200 26300 0.9511694
13 Southland South 31196.0604 98300 29500 0.9785069
14 Tasman South 9615.9760 51100 25700 0.9718981
15 Nelson South 422.1952 51400 27200 0.9259674
16 Marlborough South 10457.7455 46200 27900 0.9577922
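The call that produced the table above is not shown. One way to obtain the attribute table of an sf object is `st_drop_geometry()`; the sketch below uses a hypothetical two-row object `toy` rather than the workshop data.

```r
library(sf)

# Hypothetical sf object with two attributes and a point geometry column
toy <- st_sf(
  name     = c("a", "b"),
  value    = c(1.5, 2.5),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))
)

# Remove the geometry column; what remains is a plain data.frame
attrs <- st_drop_geometry(toy)
class(attrs)
names(attrs)
```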
A coordinate reference system (CRS) is a framework that defines how locations on Earth’s surface are mathematically represented using coordinates.
The two main types are geographic and projected coordinate systems.
Geographic coordinate systems use angular measurements (latitude and longitude) to describe locations directly on Earth’s curved surface, typically measured in degrees from a reference point like the equator and prime meridian.
Projected coordinate systems, on the other hand, use mathematical transformations to convert the curved Earth onto a flat plane, resulting in coordinates expressed in linear units such as meters or feet. This process inevitably introduces some distortion but allows for easier measurement and analysis on flat maps.
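As a hedged illustration of the difference, the sketch below moves a single point from a geographic CRS (WGS 84, EPSG:4326) to the projected CRS used by the New Zealand data shown earlier (NZGD2000 / New Zealand Transverse Mercator 2000, EPSG:2193). The coordinates are an approximate location in Auckland, chosen only for illustration.

```r
library(sf)

# Geographic coordinates: longitude and latitude in degrees
pt_geo <- st_sfc(st_point(c(174.7633, -36.8485)), crs = 4326)
st_is_longlat(pt_geo)    # TRUE: angular units

# Project to NZTM 2000: coordinates become eastings/northings in metres
pt_proj <- st_transform(pt_geo, 2193)
st_is_longlat(pt_proj)   # FALSE: linear units
st_coordinates(pt_proj)
```

Distances and areas are generally easier to interpret after projecting, since the units are metres rather than degrees.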
Most CSS scholars use vector data, so we will focus on vector data.
An sf object is a data.frame plus an sfc geometry column, so most (if not all) basic data.frame operations also work on an sf object.
library(sf)
library(spData)   # provides the nz and nz_height datasets
library(ggplot2)

# Subset
south_provinces <- nz[nz$Island == "South", ]
# Union
south_nz <- st_union(south_provinces)
# Join / intersection: keep the peaks that intersect the South Island
nz_height_south <- nz_height[south_nz, ]
# Similar: st_intersection(nz_height, south_nz)
# Or: st_filter(nz_height, south_nz)
# Plot
ggplot() +
  geom_sf(data = south_provinces) +
  geom_sf(data = nz_height_south, shape = 2, col = "red") +
  theme_minimal() +
  coord_sf()
Many other spatial predicates can be used for joins and filters: st_intersects, st_touches, st_overlaps, st_contains, st_contains_properly, st_covers, st_within, st_covered_by, st_disjoint.
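A small sketch of how the choice of predicate changes the result, using hypothetical toy geometries (a 2 x 2 square and three points) instead of the workshop data. The object names are invented; `st_filter()` and `st_join()` are sf functions that accept a predicate through `.predicate` and `join` respectively.

```r
library(sf)

# Hypothetical zone: a 2 x 2 square
square <- st_sf(
  zone = "A",
  geometry = st_sfc(st_polygon(list(
    rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))
  )))
)

# Three points: inside the square, outside it, and exactly on its boundary
pts <- st_sf(
  id = 1:3,
  geometry = st_sfc(st_point(c(1, 1)), st_point(c(3, 3)), st_point(c(2, 1)))
)

# Strictly inside: the boundary point is excluded
inside <- st_filter(pts, square, .predicate = st_within)

# Touching: only the boundary point qualifies
touching <- st_filter(pts, square, .predicate = st_touches)

# Left join on intersection: points outside the square get NA for zone
joined <- st_join(pts, square, join = st_intersects)
joined$zone
```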
#Distance matrix
Units: [m]
[,1] [,2] [,3]
[1,] 0.00 30627.85 31795.56
[2,] 30627.85 0.00 1266.53
[3,] 31795.56 1266.53 0.00
#Nearest feature
[1] 13 12 12 12 11 11 11 10 10 10
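The calls behind the two outputs above are not shown. Outputs of this kind can be produced with `st_distance()` and `st_nearest_feature()`; the sketch below uses hypothetical points in a projected CRS (EPSG:2193) so that distances come out in metres.

```r
library(sf)

# Hypothetical points in NZTM 2000 (coordinates in metres)
pts <- st_sfc(
  st_point(c(1750000, 5900000)),
  st_point(c(1750003, 5900004)),
  st_point(c(1750003, 5900005)),
  crs = 2193
)

# Pairwise distance matrix, with units attached
d <- st_distance(pts)
d

# Hypothetical reference sites
sites <- st_sfc(
  st_point(c(1750000, 5900001)),
  st_point(c(1750003, 5900006)),
  crs = 2193
)

# For each point, the index of the closest site
nearest <- st_nearest_feature(pts, sites)
nearest
```

The result of `st_nearest_feature()` is a vector of row indices into the second object, which can be used to look up attributes of the nearest feature.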
Many other possibilities exist! See the sf package's list of functions.
How to link spatial data with CSS?
Increasing availability of fine-grained, large-scale geographical data.
This allows researchers to extend standard research with new possibilities:
Estimate poverty with satellite imagery.
Scarce data in developing countries: hard to assess geographical variation in poverty or affluence.
Use a neural network with satellite data to predict poverty.
Validation with survey data from some countries: the predictions explain up to 75% of the variation in local-level economic outcomes.
Open access data, scalable, low-cost.
Use Twitter data to retrieve mobility patterns in the U.S.
RQ: How segregated are the mobility patterns of Americans?
Compute the so-called segregated mobility index (SMI) to measure segregation at the neighborhood level, based on contact between neighborhoods.
“The racial segregation of a city becomes the extent to which residents fail to travel to different types of neighbourhoods with varying racial/ethnic compositions, controlling for the racial composition of a city’s neighbourhoods.”
Use 133,766,610 geotagged tweets from 375,504 individuals. Retrieve the place of residence by checking evening and early-morning tweets’ location.
The authors find that segregation goes beyond the place of residence, even though residential segregation is a key predictor of the SMI.
Check the workshop_spatial.Rmd file.
Enjoy!
We have until 12.00-ish.
Spatial Data